Quick Anomaly Detection by the
Newcomb-Benford Law, with Applications
to Electoral Processes Data from
the USA, Puerto Rico and Venezuela

Luis Pericchi and David Torres

Abstract. A simple and quick general test to screen for numerical
anomalies is presented. It can be applied, for example, to electoral
processes, both electronic and manual. It uses vote counts in officially
published voting units, which are typically widely available and
institutionally backed. The test examines the frequencies of digits in
vote counts and rests on the First (NBL1) and Second Digit
Newcomb-Benford Law (NBL2), and on a novel generalization of the law
under restrictions on the maximum number of voters per unit (RNBL2). We
apply the test to the 2004 USA presidential elections, the Puerto Rico
(1996, 2000 and 2004) governor elections, the 2004 Venezuelan
presidential recall referendum (RRP) and the previous 2000 Venezuelan
Presidential election. The NBL2 is compellingly rejected only in the
Venezuelan referendum and only for electronic voting units. Our original
suggestion on the RRP (Pericchi and Torres, 2004) was criticized by
The Carter Center report (2005). Acknowledging this, Mebane (2006)
and The Economist (US) (2007) presented voting models and case studies
in favor of NBL2. Further evidence is presented here. Moreover,
under the RNBL2, Mebane's voting models are valid under wider
conditions. The adequacy of the law is assessed through Bayes Factors (and
corrections of p-values) instead of significance testing, since for large
sample sizes and fixed α levels the null hypothesis is rejected too
often. Our tests are extremely simple and can become a standard screening
that a fair electoral process should pass.

Key words and phrases: Bayes Factors, election forensics,
Newcomb-Benford Second Digit Law 2BL, Restricted Newcomb-Benford Law,
electronic elections, p-value corrections, quick anomaly detection,
universal lower bound.
1. INTRODUCTION

The Newcomb-Benford Law (NBL) postulates that the frequency of
significant digits follows a distribution quite different from the
Uniform (see Tables 1-2), as originally discovered by Newcomb (1881)
and Benford (1938).

Although the NBL works for any vector of significant digits, we will
use the marginal and joint distributions of the first or second digits
to check the law. Our goal is to develop methods for initial scrutiny of
officially published electoral data. Official counts (published by the
state electoral authority) are presented at quite variable levels of
aggregation. We call an "electoral unit" the officially reported, least
aggregated data unit. The composition and size of these units vary
widely in different elections. The data may be aggregated at county
level (USA) or reported at an elementary polling unit when no
aggregation is performed (Venezuela). If results are reported from
polling machines of around 400 voters or fewer, the frequency
distribution of the first digit of vote counts is heavily affected. On
the other hand, the frequency of second digits should be less affected.
That is why testing the second digit frequency, although less natural
and less powerful than testing the first digit, is of wider
applicability. Our main proposal is to check the second digit
Newcomb-Benford Law NBL2 (also known as 2BL), or a variation of it that
takes upper restrictions into account, RNBL2. However, in cases where
the official data are aggregated, as in USA national electoral data,
the first digit distribution, and even the joint first and second digit
distribution, fit the data extremely well; see Section 4.

The Carter Center was one of the foreign institutions which oversaw the
Venezuelan 2004 Presidential Referendum, and was accepted as a
monitoring external referee by both the government and the opposition;
see http://www.cartercenter.org/homepage.html. In the Carter Center
Report (2005), pages 132-133, our novel suggestion to use the Second
Digit NBL to scrutinize the Venezuelan 2004 Referendum was criticized on
the following 3 grounds: (1) The law is characteristic of scale
invariant data with specific units, like centimeters or kilograms, so
presumably it should not apply to elections and vote counts. The
Newcomb-Benford Law has a simple justification for numbers which have
units, like weights, distances, temperatures, dollars or science
constants, to which scale invariance applies; see, for example,
Pietronero, Tosatti and Vespignani (2001).

However, for unit-less data, like number of votes, a mathematically
well grounded justification exists for using the law. It is based on a
series of now classical contributions by Hill (1995, 1996) that were
summarized in Statistical Science. Hill establishes that NBL holds
asymptotically if the numbers are generated as unbiased mixtures of
different populations, and the more mixing, the better the
approximation. For example, if we generate numbers from a Normal
distribution or from a Cauchy distribution, NBL will be followed more
closely in the latter because the Cauchy distribution is a scale mixture
of Normal distributions. Mixtures of Cauchy distributions may lead to an
even better fit of NBL (Raimi, 1976). Conversely, if the NBL is
rejected, then the vote counts are suspected of not being an unbiased
realization of numbers sampled from mixtures of distributions. How to
implement this test is the subject matter of our method. (2) A second
criticism was empirical: "First digit of precinct-level electoral data
for Cook County, the city of Chicago, and Broward County, Fla. depart
significantly from Benford's Law, primarily because of the relatively
constant number of voters in voting precincts." But this criticism is
about the distribution of first digits, and not the distribution of
second digits. For low levels of aggregation of votes, we proposed the
second digit distribution (or a generalization), precisely because of
the limits on the number of voters that produce the "...relative
constant number of voters in voting precincts." The second digit is far
less sensitive to constant numbers of voters per polling unit.

Compliance with the law based on the first digit is to be expected only
for greater levels of aggregation, as, for example, in the USA 2004
election, in which both the first and second digit laws show impressive
fit; see Section 4.1. It should also be emphasized that the results
toward NBL are asymptotic in nature, and we require a substantially
large number of votes to claim a reasonably asymptotic situation, which
can be claimed, perhaps, only for the Chicago data among the cases
listed by the Carter Center Panel. From an empirical point of view, in
this paper we show several elections (with larger data sizes) with good
fit to NBL (see Section 4), where compliance with the law is the norm
rather than the exception. There is a rapidly increasing number of
contributions in which compliance with and violations of the NBL have
been presented for electoral votes; see Pericchi and Torres (2004),
Mebane (2006, 2007a, 2007b), Torres et al. (2007) and Buttorff (2008),
among others. (3) A final criticism, raised by the panel appointed by
the Carter Center, was that under some (perhaps oversimplistic)
electoral models, computer simulations did not yield frequencies of
second digits in accordance with NBL2. The fact that for some
mathematical models NBL2 is not observed may also be regarded as
evidence of the lack of realism of such models, and more sophisticated
idealizations ought to be sought. Taylor (2005, 2009), who was part of
the Carter Center Panel, gives a brief and intriguing discussion of the
Newcomb-Benford law regarding elections. The claim is made that the NBL
is of "little use in fraud detection" for elections. However, the
rationalization covers only the first digit NBL and not the second
digit. Data are simulated from models that can be criticized for not
being realistic, since realistic population voting models should not be
homogeneous on each electoral unit, but should be mixtures of different
populations (see next paragraph). The claim seems to be that the results
of the simulations contradict NBL for the first, second and third digit
laws. However, no measures of fit are provided and, intriguingly, the
figures that cover the second and third digits have only 9 entries,
although there are 10 possible second and third digits. (See Taylor,
2005, Figure 8, page 23, Technical Report version November 7, 2005.)
Furthermore, for the second digit at least, the fits of the votes for
and against the government appear to be markedly different, a fact that
is not discussed in the cited Technical Report.

The negative criticism of the Carter Center Panel did not convince
everybody. Acknowledging our original suggestion and the Carter Center
Report, Walter Mebane gave an invited talk at the Annual Meeting of the
American Association for the Advancement of Science which was reported
in The Economist (US) (2007), where the suggestive term "Election
Forensics" was coined by Mebane. He provides further support for the use
of the 2nd digit NBL (calling it 2BL) for an initial quick scrutiny of
elections that is based solely on officially reported data on the
current election and does not require the use of covariates (Mebane,
2006). Mebane produced simulations from realistic models of electorate
behavior which are consistent with the 2nd digit NBL, and also presented
different types of fraud that are detected by tests on the 2nd digit NBL
(although not all frauds are detected). His models, an interesting
reflection of political behavior, are hierarchical mixed population
models, denoted here HMPM. In these models there are two populations of
voters at each polling station: the partisan population, strongly in
favor of a candidate, and the general population, swinging between
candidates. There was, however, a question about the general
applicability of the 2nd digit Law: Mebane's models produce frequencies
according to NBL2 for some numbers of voters per unit, say, 2000 or 3000
voters per electoral unit, but not for others, say, 2250. We introduced
the Restricted Newcomb-Benford Law (RNBL) in Torres Núñez (2006), before
being aware of Mebane's models. It turns out that the RNBL2 is generally
consistent with Mebane's models, which is illustrated in Table 4.

The NBL1 has been utilized before to check, for example, tax fraud
(Nigrini, 1995) and microarray data corruption (Torres Núñez, 2006).
Its use for elections is timely, since electronic voting is raising
fresh concerns about the possibility of massive interference with the
digital data (Pericchi and Torres, 2004).

The official electoral data, when not presented with levels of
aggregation, may have a small upper bound, namely, the number of
potential voters. In that respect, when necessary, we proceed in two
ways: (1) Check the second digit Law instead of the first, because the
second digit is far less affected, if at all, by restrictions on the
total; (2) If (1) fails, try the restricted second digit law RNBL2 with
realistic upper bounds; see next section. If both fail, then the alarm
is on and further study is required.

The general empirical picture that emerges is that the fit of NBL is
accepted in the elections in the USA, in Puerto Rico and in the manual
elections in Venezuela. (In USA 2004, even the first digit test and the
more complex joint first and second digit test accept NBL without
restrictions.) Electronic voting in Venezuela, however, fails the test
in the recall referendum and, to some extent, in the previous
presidential election, adding to the suspicions about electronic voting,
particularly without universal paper checking and audits prior to the
sending of the data to the central polling station.

This paper is organized as follows: Section 2 is 
devoted to the description of the law and a gener- 
alization. Section 3 discusses different methods, al- 
ternative to the use of p-values to judge the fit of 
the models. Section 4 presents the data analysis of 
the USA, Puerto Rico and Venezuelan elections and 
Venezuelan recall referendum. Section 5 states some 
conclusions. 

2. OVERVIEW ON THE 
NEWCOMB-BENFORD FRAMEWORK 

Intuitively, most people assume that in a string of numbers sampled
randomly from some body of data, the first nonzero digit could be any
number from 1 through 9, with all nine numbers being equally probable.
Empirically, however, it has been found that a law first discovered by
Newcomb and later popularized by Benford is ubiquitous.

Table 1
Newcomb-Benford Law for the first significant digit

Digit        1      2      3      4      5      6      7      8      9
Probability  0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046

For the first and second digit Newcomb-Benford 
Laws we have discrete probability distribution val- 
ues presented in Table 1 and Table 2, respectively, 
which are quite different from the Uniform Distri- 
bution. 

The most general probabilistic justification of the
NBL is in Hill (1996).

Hill developed the probability theory that justifies 
the asymptotic validity of the law for data such as 
people counts, which do not have units like grams 
or meters. 

The aim here is to use and generalize the New- 
comb-Benford Law in order to apply it to wider 
classes of data sets, particularly arising from elec- 
tions and to verify their fit to different sets of data 
with Bayesian statistical methods. 

The general definition of the Newcomb-Benford Law is stated here in
base 10, for simplicity. First we introduce the simpler laws for the
first and second significant digits. Let $D_1, D_2, \ldots$ denote the
significant digit functions. For example, $D_2(0.154) = 5$ gives the
second significant digit:

\[
P_1(d_1) = \mathrm{Prob}(D_1 = \text{First significant digit} = d_1)
         = \log_{10}(1 + 1/d_1), \qquad d_1 = 1, 2, \ldots, 9,
\]

\[
P_2(d_2) = \mathrm{Prob}(D_2 = \text{Second significant digit} = d_2)
         = \sum_{k=1}^{9} \log_{10}\bigl(1 + 1/(10k + d_2)\bigr),
         \qquad d_2 = 0, 1, \ldots, 9.
\]

For all positive integers $k$, all $d_1 \in \{1, \ldots, 9\}$ and
$d_j \in \{0, 1, \ldots, 9\}$ for $j = 2, \ldots, k$, the joint
Newcomb-Benford distribution is

\[
P_{1,\ldots,k}(d_1, \ldots, d_k)
  = \mathrm{Prob}(D_1 = d_1, \ldots, D_k = d_k)
  = \log_{10}\Biggl[1 + \Biggl(\sum_{i=1}^{k} d_i \, 10^{k-i}\Biggr)^{-1}\Biggr].
\]
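As an illustration of the three formulas above, the following minimal Python sketch evaluates the first digit, second digit and joint laws. The function names are ours and purely illustrative; they are not part of the original analysis.

```python
import math


def nbl_first_digit():
    """First digit law: P1(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def nbl_second_digit():
    """Second digit law: P2(d) = sum over k = 1..9 of log10(1 + 1/(10k + d))."""
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def nbl_joint(digits):
    """Joint law for the leading digits d1, ..., dk (d1 must be nonzero)."""
    if not digits or not 1 <= digits[0] <= 9:
        raise ValueError("the first significant digit must be in 1..9")
    if any(not 0 <= d <= 9 for d in digits[1:]):
        raise ValueError("later digits must be in 0..9")
    # d1 d2 ... dk read as a base-10 integer equals sum_i d_i * 10**(k - i).
    value = int("".join(str(d) for d in digits))
    return math.log10(1 + 1 / value)


if __name__ == "__main__":
    for d, p in nbl_first_digit().items():
        print(d, round(p, 3))
```

Rounded to three decimals, the first two functions can be compared directly with Tables 1 and 2, and summing the joint law over the first digit recovers the second digit law.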



In the remainder of this section we postulate the way in which the N-B
Law acts under restrictions, when the number of electors per electoral
unit is restricted to be no larger than a relatively small and known
number $K$. This may be important when official data have not been
aggregated. The notation used in the following discussion is:

1. $p_i^B(d_i)$ is the Newcomb-Benford Probability Distribution for the
   digit $i$ and number $d_i$. These are presented in Table 1 and
   Table 2 for the first ($i = 1$) and second ($i = 2$) significant
   digit, respectively.

2. $p_i^C(d_i)$ under the constraint $N \le K$ is the proportion of the
   numbers with $i$th digit equal to $d_i$ in the set of numbers that
   are smaller than or equal to $K$, that is,
   $p_i^C(d_i) = \#\{d_i \le K\}/K$, where $\#\{d_i \le K\}$ is the
   cardinality of numbers with $i$th digit equal to $d_i$ that are no
   bigger than $K$;

3. $p_i^U(d_i)$ is the proportion of numbers with $i$th digit equal to
   $d_i$ if no constraints were present.

Note that if there is no restriction, then $p_i^C = p_i^U$. However, if
$K = 800$, for example, then for the first significant digit,
$p_1^B(d_1 = 2) = 0.176$ (see Table 1), $p_1^C(d_1 = 2) = 111/800$ and
$p_1^U(d_1 = 2) = 1/9$.

Definition 2.1. The Restricted N-B Law (RNBL) distribution is

\[
P_i(d_i \mid N \le K)
  = \frac{p_i^B(d_i)\, p_i^C(d_i) / p_i^U(d_i)}
         {\sum_{d_i'} p_i^B(d_i')\, p_i^C(d_i') / p_i^U(d_i')}.
\tag{2.1}
\]



Table 2
Newcomb-Benford Law for the second significant digit

Digit        0      1      2      3      4      5      6      7      8      9
Probability  0.120  0.114  0.109  0.104  0.100  0.097  0.093  0.090  0.088  0.085
Table 3
NBL for first and second digit with and without an upper restriction of 800

Digit      0      1      2      3      4      5      6      7      8      9
NB1               0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
CNB1_800          0.330  0.193  0.137  0.106  0.087  0.073  0.064  0.006  0.005
NB2        0.120  0.114  0.109  0.104  0.100  0.097  0.093  0.090  0.088  0.085
CNB2_800   0.121  0.114  0.109  0.104  0.100  0.097  0.093  0.090  0.087  0.085



The heuristic behind the RNBL is as follows: sample from sets of
numbers that obey NBL, but reject a number if and only if it does not
obey the restriction. Note that if $p_i^C = p_i^U$, then the usual
Newcomb-Benford Law (NBL) is recovered, whether there is a restriction
or not. Take as an example the first digit law. If the numbers are
restricted to be less than or equal to $K = 9$, there is no correction
to the NBL. But if $K = 15$, say, a substantial correction applies. Note
also that the restricted rule is also valid for lower bound restrictions
of the form $N \ge K$, or even for two sided restrictions.

For positive numbers, there is a simpler expression for the equation
above in terms of the cardinality of the sets induced by the
restriction. It turns out that $p_i^U(d_i) = \text{constant}$ (the
constant is equal to 1/9 for the first digit and to 1/10 for the second
digit). This fact allows us to cancel $p_i^U$ in (2.1). Now let
$\#\{d_i \le K\}$ be the number of positive integers less than or equal
to $K$ with the $i$th significant digit equal to $d_i$. We may now
simplify (2.1) as follows:

\[
\begin{aligned}
P_i(d_i \mid N \le K)
  &= \frac{p_i^B(d_i)\, p_i^C(d_i)}{\sum_{d_i'} p_i^B(d_i')\, p_i^C(d_i')}
     && \text{(canceling } p_i^U = c\text{)} \\
  &= \frac{p_i^B(d_i)\, \#\{d_i \le K\}/K}
          {\sum_{d_i'} p_i^B(d_i')\, \#\{d_i' \le K\}/K} \\
  &= \frac{p_i^B(d_i)\, \#\{d_i \le K\}}
          {\sum_{d_i'} p_i^B(d_i')\, \#\{d_i' \le K\}}
     && \text{(canceling } K\text{)}.
\end{aligned}
\]

This simpler expression is easier to calculate.
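The simplified expression lends itself to direct computation by counting. The Python sketch below is one possible implementation, assuming (our assumption, not stated in the text) that integers with fewer than i digits are simply left out of the count for the i-th digit; the function names are illustrative.

```python
import math


def nbl_probs(i):
    """Unrestricted NBL probabilities for the first (i=1) or second (i=2) digit."""
    if i == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    if i == 2:
        return {
            d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
            for d in range(10)
        }
    raise ValueError("only i = 1 or i = 2 is sketched here")


def digit_cardinalities(K, i):
    """Count the integers in 1..K whose i-th significant digit equals each d."""
    counts = {}
    for n in range(1, K + 1):
        s = str(n)
        if len(s) >= i:  # assumption: numbers with fewer than i digits are skipped
            d = int(s[i - 1])
            counts[d] = counts.get(d, 0) + 1
    return counts


def rnbl_probs(K, i):
    """Restricted law: NBL probability times cardinality, renormalized."""
    base = nbl_probs(i)
    card = digit_cardinalities(K, i)
    weights = {d: base[d] * card.get(d, 0) for d in base}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("K is too small for the requested digit position")
    return {d: w / total for d, w in weights.items()}


if __name__ == "__main__":
    for d, p in rnbl_probs(800, 1).items():
        print(d, round(p, 3))
```

With K = 800 the rounded output can be checked against the CNB1_800 and CNB2_800 rows of Table 3; for instance, 111 of the integers up to 800 have first digit 2, as in the example above.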

Comment 1. In Table 3 we calculated the restricted law with an upper
bound of 800. There it is seen that the first digit is more affected by
the constraint than the second digit, illustrating that the second digit
NBL is of wider applicability than the first digit NBL.

Table 4
Table with an upper bound of N = 2250 voters, that illustrates the
better fit of the restricted law, over m = 999 simulations

                  m     P(H0|data)      p-value   P(H0|data)
                        (Bayes Factor)            (lower bound)
No restrictions   999   0.9996          0.001     0.018
Restrictions      999   1.0000          0.802     >0.5

Mebane (2006, 2007a, 2007b) introduced realistic models (HMPM models)
of electoral behavior that produced frequencies consistent with the
NBL2 for some numbers of electors per unit, like 2000, but not for
others, such as 2250.

Table 4 displays a large simulation with expected 
maximum number of voters of 2250 which shows 
the second digit RNBL to be more consistent with 
HMPM models than the usual second digit NBL, as 
anticipated. 

3. CHANGING P-VALUES TO NULL 
HYPOTHESIS PROBABILITIES 

The p-value is the probability of getting values of the test statistic
as extreme as or more extreme than the value actually observed, given
that the null hypothesis is true. For the first significant digit, the
observed chi-squared statistic $X^2_{\mathrm{observed}}$ is given by

\[
X^2_{\mathrm{observed}}
  = \text{Sample size} \times \sum_{d=1}^{9}
    \frac{\bigl(\mathrm{Prob}(D_1 = d) - f_d\bigr)^2}{\mathrm{Prob}(D_1 = d)},
\tag{3.1}
\]

where $f_d$ is the observed proportion of the digit $d$ as the first
significant digit. For the second significant digit,
$D_2 = d \in \{0, 1, \ldots, 9\}$. This is the basis of a classical test
of the null hypothesis that the data follow the Newcomb-Benford Law. If
the null hypothesis is accepted, the data "passed" the test. If not, an
inconsistency has been found which opens the possibility of manipulation
of the data. In the electoral process the null hypothesis is $H_0$: the
data are consistent with the Newcomb-Benford proportions for the second
significant digit (in Table 2), while the alternative $H_1$ means that
there is an inconsistency with the law. It is important to get a
quantification of the evidence in favor of the Null Hypothesis. In our
case, if the data obey Newcomb-Benford's Law, then the test offers no
basis to suspect undue intervention in the electoral process.
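The statistic in (3.1), written for the second significant digit, can be sketched in a few lines of Python. This is only an illustration under our own assumptions: vote counts are positive integers, and units with fewer than two digits are dropped because they have no second digit. The function names are hypothetical, and the conversion of the statistic into a p-value (a chi-squared tail probability with 9 degrees of freedom) is left to a statistics library.

```python
import math
from collections import Counter


def nbl_second_digit():
    """NBL probabilities for the second significant digit, d = 0..9."""
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def chi_squared_second_digit(vote_counts):
    """Return (X^2 statistic, number of usable units) for the second digit."""
    # Assumption: counts below 10 have no second digit and are skipped.
    digits = [int(str(v)[1]) for v in vote_counts if v >= 10]
    n = len(digits)
    if n == 0:
        raise ValueError("no vote counts with at least two digits")
    freq = Counter(digits)
    probs = nbl_second_digit()
    stat = n * sum(
        (probs[d] - freq.get(d, 0) / n) ** 2 / probs[d] for d in range(10)
    )
    return stat, n


if __name__ == "__main__":
    # Purely artificial counts, used only to show the call.
    example_counts = [132, 87, 1045, 9, 310, 276, 58, 412, 199, 120]
    print(chi_squared_second_digit(example_counts))
```

The same function with the first digit probabilities and `str(v)[0]` gives the first digit version of (3.1).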

There is a well known statistical confusion between the probability
that the null hypothesis is true and the p-value. One general way to
calibrate p-values is through the Universal Lower Bound, due to Sellke,
Bayarri and Berger (2001). For a null hypothesis, $H_0$, we have

\[
p_{\mathrm{val}} = P\bigl(\chi^2_\nu \ge X^2_{\mathrm{observed}}
  \mid \text{Null hypothesis is true}\bigr),
\]

where $\nu$ is the degrees of freedom, which is equal to 8 for the first
significant digit and 9 for the second and onward. If the p-value is
small (e.g., p-values of 0.05 or less), it is assumed, based on
uncritical practice and convention, that there is a significant result.
But the p-value is not the probability that the sample arose from the
null hypothesis and, therefore, it should not be interpreted as such a
probability. The usefulness and interpretation of a p-value is
drastically affected by the sample size.

A useful way to calibrate a p-value, under a Robust Bayesian
perspective, is by using the bound that is found as the minimum
posterior probability of $H_0$ that is obtained by changing the priors
over large classes of priors under the alternative hypothesis. If a
priori we have equal prior probabilities for the two hypotheses,
$P(H_0) = P(H_1) = 1/2$, and for $p_{\mathrm{val}} < e^{-1}$, then

\[
P(H_0 \mid p_{\mathrm{val}})
  \ge \frac{1}{1 + \bigl[-e \, p_{\mathrm{val}} \log_e(p_{\mathrm{val}})\bigr]^{-1}}.
\tag{3.2}
\]

A full discussion of these matters can be found in Sellke, Bayarri and
Berger (2001).

It is more appropriate to report the Universal Lower Bound (3.2) than
the p-value, with respect to the goodness of fit test of the proportions
in the observed digits versus those proportions specified by the
Newcomb-Benford Law. As we can see in Table 5, the correction is quite
important. This table shows how much larger this lower bound is than the
p-values. Small p-values (e.g., $p_{\mathrm{val}} = 0.05$) imply that
the posterior probability of the null hypothesis is at least 0.29, which
is not very strong evidence to reject a null hypothesis.

Table 5
p-values in terms of Hypotheses probabilities

p_val    P(H0|data)
0.05     0.29
0.01     0.11
0.001    0.0184
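The calibration in (3.2) is a one-line computation. The Python sketch below is illustrative only; returning 0.5 when the p-value is at or above 1/e is our assumption about how to handle the range where (3.2) is not stated, chosen because the tables in Section 4 report such cases as ">0.5".

```python
import math


def posterior_lower_bound(p_value):
    """Lower bound (3.2) on P(H0 | p_val), assuming P(H0) = P(H1) = 1/2."""
    if not 0 < p_value < 1:
        raise ValueError("p_value must lie strictly between 0 and 1")
    if p_value >= 1 / math.e:
        # Assumption: outside the range of (3.2) we report the bound as 0.5.
        return 0.5
    bayes_factor_bound = -math.e * p_value * math.log(p_value)
    return 1 / (1 + 1 / bayes_factor_bound)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(p, round(posterior_lower_bound(p), 4))
```

The three p-values in the usage lines are those of Table 5, so the printed bounds can be compared with that table.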

However, the lower bound correction does not de- 
pend on sample size, so for large sample sizes it 
can be very conservative. For a full correction of 
p-values, a Bayes Factor is needed, with the corre- 
sponding posterior probability of the null hypothe- 
sis. Next we compute a very simple Bayes Factor, 
based on a Uniform prior. 

3.1 Posterior Probabilities with Uniform Priors 

Let $T_1 = \{1, 2, \ldots, 8, 9\}$ and $T_2 = \{0, 1, \ldots, 8, 9\}$.
The elements that may appear when the first digit is observed are
members of $T_1$, and if we observe the second digit or higher, the
observations are members of $T_2$. Let

\[
\Omega_0 = \{p_k = p_{k0}\}
\]

for $k = 1, \ldots, 9$ in the case of the first digit and
$k = 0, 1, \ldots, 9$ for the second digit, where $p_{k0}$ denotes the
Newcomb-Benford probability of digit $k$. Then our hypotheses can be
written as

\[
H_0 : \Omega_0 \quad \text{versus} \quad H_1 : \Omega_0^c,
\tag{3.3}
\]

where $\Omega_0^c$ means the complement of $\Omega_0$. In other words,

\[
\Omega_0^c = \{p_i \ne p_{i0} \text{ for at least one } i \in T\}.
\]

As the simplest objective prior distribution, assume a uniform prior for
the values of the $p_i$'s; then

\[
\pi^U(p_1, p_2, \ldots, p_k) = \text{constant} = \Gamma(k) = (k-1)!,
\tag{3.4}
\]

which is the correct normalization constant, as is seen from the
well-known integral
$\int \cdots \int dp_1 \cdots dp_{k-1} = 1/\Gamma(k)$.
We can write the posterior probability of $H_0$ in terms of the Bayes
Factor. Let $\mathbf{x}$ be the data vector; then the Bayes Factor is

\[
B_{01} = \frac{P(H_0 \mid \mathbf{x})\, P(H_1)}{P(H_1 \mid \mathbf{x})\, P(H_0)}.
\tag{3.5}
\]

If we have nested models and $P(H_0) = P(H_1) = \tfrac{1}{2}$, then the
Bayes Factor reduces to

\[
B_{01} = \frac{P(H_0 \mid \mathbf{x})}{P(H_1 \mid \mathbf{x})},
\tag{3.6}
\]

where

\[
P(H_0 \mid \mathbf{x}) = \frac{B_{01}}{B_{01} + 1}.
\tag{3.7}
\]



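For the $i$th significant digit, the data vector is
$\mathbf{n} = (n_1, n_2, \ldots, n_k)$, where $n_d$ is the frequency
with which $d$ is the $i$th significant digit in the data, and
$n = \sum_i n_i$. Using the definition of a Bayes Factor with a simple
null hypothesis, we have

\[
B_{01} = \frac{f(n_1, \ldots, n_k \mid \Omega_0)}
              {\int f(n_1, \ldots, n_k \mid \Omega_0^c)\,
               \pi^U(p_1, \ldots, p_k)\, dp_1 \cdots dp_{k-1}},
\]

with $\sum_{i \in T} p_i = 1$ and $p_i \ge 0$ for all $i \in T$.
Substituting our assumptions,

\[
B_{01} = \frac{\dfrac{n!}{n_1! \cdots n_k!}\, p_{10}^{n_1} p_{20}^{n_2} \cdots p_{k0}^{n_k}}
              {\dfrac{n!}{n_1! \cdots n_k!}\, (k-1)!
               \int \prod_{i=1}^{k} p_i^{n_i}\, dp_1 \cdots dp_{k-1}}.
\]

After canceling factorial terms and using the identity

\[
\int \prod_{i=1}^{k} p_i^{(n_i + 1) - 1}\, dp_1 \cdots dp_{k-1}
  = \frac{\prod_{i=1}^{k} \Gamma(n_i + 1)}{\Gamma(n + k)},
\]

we obtain a simplified expression for $B_{01}$:

\[
B_{01} = \frac{p_{10}^{n_1} p_{20}^{n_2} \cdots p_{k0}^{n_k}}
              {(k-1)!\, \prod_{i=1}^{k} \Gamma(n_i + 1) / \Gamma(n + k)}.
\tag{3.8}
\]

To obtain the posterior probability using the Bayes Factor, we use (3.7)
and substitute $B_{01}$ to get

\[
P(H_0 \mid \mathbf{x})
  = \frac{p_{10}^{n_1} p_{20}^{n_2} \cdots p_{k0}^{n_k}}
         {p_{10}^{n_1} p_{20}^{n_2} \cdots p_{k0}^{n_k}
          + (k-1)!\, \prod_{i=1}^{k} \Gamma(n_i + 1) / \Gamma(n + k)}.
\tag{3.9}
\]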



In Torres Núñez (2006), calculations of posterior probabilities with
several other priors and approximations are presented. The conclusions
are similar to those presented here. (See Berger and Pericchi (2001)
for priors and approximations in Bayesian Model Selection.)
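For realistic sample sizes the products in (3.8) underflow, so a numerical version is better written on the log scale. The Python sketch below is one way to evaluate (3.8) and (3.7) with the log-gamma function; it is an illustration under the uniform prior and equal prior odds used above, and the function names are ours.

```python
import math


def log_bayes_factor(counts, null_probs):
    """log B01 from (3.8): digit counts n_i against null probabilities p_i0."""
    if len(counts) != len(null_probs):
        raise ValueError("counts and null_probs must have the same length")
    n = sum(counts)
    k = len(counts)
    # Numerator of (3.8): sum of n_i * log(p_i0).
    log_null = sum(c * math.log(p) for c, p in zip(counts, null_probs))
    # Denominator of (3.8): (k-1)! * prod Gamma(n_i + 1) / Gamma(n + k).
    log_alt = (
        math.lgamma(k)
        + sum(math.lgamma(c + 1) for c in counts)
        - math.lgamma(n + k)
    )
    return log_null - log_alt


def posterior_null_probability(counts, null_probs):
    """P(H0 | data) = B01 / (B01 + 1), as in (3.7), computed stably."""
    log_b = log_bayes_factor(counts, null_probs)
    if log_b >= 0:
        return 1 / (1 + math.exp(-log_b))
    b = math.exp(log_b)
    return b / (1 + b)


if __name__ == "__main__":
    nbl1 = [math.log10(1 + 1 / d) for d in range(1, 10)]
    # Artificial first digit counts, used only to show the call.
    print(posterior_null_probability([301, 176, 125, 97, 79, 67, 58, 51, 46], nbl1))
```

The counts are ordered as the digits of $T_1$ or $T_2$; the counts in the usage line are artificial and are not taken from any of the elections analyzed here.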

4. RESULTS AND DATA ANALYSIS 

We illustrate the use of the First and Second digit Newcomb-Benford Law
with data from the 2004 USA elections, three elections in Puerto Rico,
the Presidential Recall referendum in Venezuela and one previous
Presidential election in that country. We denote by NB1 and NB2 the
analysis according to the first and second digit NBL, respectively. We
show in the tables the value m, which denotes the number of electoral
units, and the median number of votes for the respective candidate on
the information units. There is wide variation in the aggregation of the
numbers, with the USA case as the most aggregated and Venezuela the
least aggregated. That is the reason why the first digit law is obeyed
only in the USA, where the fit is remarkable. In most cases, the second
digit law is also obeyed, without the need to use the restricted NBL.
The case in which the NBL2 was overwhelmingly violated is the Venezuelan
Presidential recall vote. We attempted to mend it by restricting the Law
for various plausible upper bounds, but the fit did not improve.

4.1 United States Elections 2004 

The first case in point is the 2004 USA presidential election; see
Tables 6 and 7 and Figures 1-7. The data at the level of counties can be
found at http://us.cnn.com/ELECTION/2004/pages/results/.

(Note: Nader's votes had to be constructed from alternative sources.)
This is one of the best case studies we know about the inadequacy of
p-values when compared to the impressive fit of the NBL with both the
first and the second digit, and even with the joint density of first and
second digit. For example, in the case of Bush's votes, for the first
digit the fit is excellent, but the p-value is only 0.003, significant
even at the 0.01 level. On the other hand, the absolute minimum of
posterior probabilities of the null hypothesis is 0.05, over sixteen
times the p-value. Note that this is only a lower bound over all
possible prior distributions, which is certainly understating the true
evidence. Not surprisingly, a real Bayes Factor leads to a posterior
probability of almost one.

Table 6
Summary USA 2004 Elections

United States 2004   Min   1st Qu.   Median   Mean      3rd Qu.   Max.
Bush votes           2     1816      5047     18380     14130     1076000
Kerry votes          3     973.2     3225.0   16840.0   9156.0    1908000
Nader votes          1     13        31       143.7     85        13251

Table 7
USA 2004 Elections

United States 2004   m      Median   P(H0|data)      p-value   P(H0|data)
                                     (Bayes Factor)            (lower bound)
NB1 Bush votes       4715   3694     1.000           0.003     0.050
NB1 Kerry votes      4714   2603     1.000           0.002     0.034
NB1 Nader votes      2822   8        1.000           0.833     >0.5
NB2 Bush votes       4708   3713     1.000           0.068     0.331
NB2 Kerry votes      4698   2621     1.000           0.651     >0.5
NB2 Nader votes      2271   44       1.000           0.830     >0.5

The best fit is for Nader's votes, which are not significant for either
the first or the second digit NBL, and so, not surprisingly, the
posterior probability of compliance with NBL is one. The first digit
tests for Bush's and Kerry's votes are significant with small p-values,
but the posterior probabilities are virtually one. For the second digit
law the fit in all these cases is excellent. This is illustrated by
Figures 1-7.

Fig. 1. Empirical distributions of the first two digits of the presidential candidates vs. N-B Law for the first two digits.

Fig. 2. Bush's digit proportions vs. N-B Law for the 1st digit.

Fig. 3. Kerry's digit proportions vs. N-B Law for the 1st digit.

4.2 Puerto Rico 

Here we show the data for the three main parties 
(PNP, PPD and PIP) in the 1996, 2000 and 2004 
elections for governor. The data can be found at 
http : / / elect ionspuert or ico . org/ datos/2004 
and http : //www . ceepur . org/ elecciones2000/. 

The results about the first digit are significant. 
Moreover, the posterior probabilities also reject the 
NBLl. The restricted NBL for the first digit does 
not show a big improvement either. This may be 
due to the fact that in electoral processes, the up- 




FlG. 4. Nader's digit proportions vs N~B Law for the 1st 
digit. 
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Newcomb-Benford Second Digit Law and Bush Vote. 



- 2"^" Digit Law 

- 2^^' Digit Bush 




2 3 4 5 

Digits 



Fig. 5. Bush's digit proportions vs. N-B Law for the 2nd 
digit. 

per bounds (the total number of electors per polling 
station) is not typically fixed across the population 
of polling stations. However, the second digit shows 
an excellent fit to the NBL2 Law, and the results 
with restrictions do not change much, illustrating 
again that the effect of bounds in the second digit 
is usually smaller than for the first digit NBL. 

4.3 Venezuela 

4.3.1 Referendum The 2004 Presidential Revocatory Referendum
in Venezuela has attracted considerable interest and
controversy. (Data from the Referendum can be found at
http://www.cne.gob.ve, http://www.venezuela-referendum.com,
https://sites.google.com/a/upr.edu/probability-and-statistics/data-files-1,
http://esdata.info/2004.)

One of the most interesting features of this process is that
it was partly manual and partly electronic: the majority of
the polling stations had electronic voting, but a sizeable
proportion were manual. Here, NO means in favor of the
President and SI against.

The most salient feature is that the electronic NO votes give
evidence against the NB2 Law. Figure 11 shows that their
second digits seem to be uniformly distributed. This is not
the case for the manual votes, or for the SI electronic votes.
This finding is quite informative: the electronic votes in
favor of the government need closer scrutiny.

Fig. 6. Kerry's digit proportions vs. N-B Law for the 2nd
digit. (Plot title: Newcomb-Benford Second Digit Law and Kerry
Vote; legend: 2nd Digit Law, 2nd Digit Kerry; x-axis: Digits.)

Fig. 7. Nader's digit proportions vs. N-B Law for the 2nd
digit. (Plot title: Newcomb-Benford Second Digit Law and Nader
Vote; legend: 2nd Digit Law, 2nd Digit Nader; x-axis: Digits.)

Table 8

Results of the 1996 Governor Elections in Puerto Rico

Puerto Rico 1996    m       P(H0|data)    p-values    Lower bound on P(H0|data)

NB2 PNP             1836    1.000         0.554       >0.5
NB2 PPD             1839    1.000         0.138       0.426
NB2 PIP             1466    1.000         0.104       0.390

Table 9

Results of the 2000 Governor Elections in Puerto Rico

Puerto Rico 2000    m       P(H0|data)    p-values    Lower bound on P(H0|data)

NB2 PNP             1823    1.000         0.979       >0.5
NB2 PPD             1878    1.000         0.436       >0.5
NB2 PIP             1579    1.000         0.450       >0.5

Table 10

Results of the 2004 Governor Elections in Puerto Rico

Puerto Rico 2004    m       P(H0|data)    p-values    Lower bound on P(H0|data)

NB2 PPD             1924    1.000         0.154       0.440
NB2 PNP             1917    1.000         0.538       >0.5
NB2 PIP             1402    1.000         0.822       >0.5
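The comparison described in this subsection, second digits of polling-station counts set against the NBL2 and against a uniform distribution, can be sketched as follows. This is a minimal illustration, not the authors' code: the vote counts are hypothetical, the function names are made up for the example, and the Pearson chi-square statistic is used only as a descriptive distance, since the paper argues that p-values over-reject for large samples.

```python
import math
from collections import Counter


def nbl2_probabilities():
    """Second-digit Newcomb-Benford probabilities for digits 0..9."""
    return [
        sum(math.log10(1.0 + 1.0 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]


def second_digit(count):
    """Second significant digit of a non-negative integer; None if it has one digit."""
    digits = str(int(count))
    return int(digits[1]) if len(digits) >= 2 else None


def second_digit_counts(vote_counts):
    """Tally second digits 0..9, skipping counts below 10 (no second digit)."""
    tally = Counter(second_digit(v) for v in vote_counts if v >= 10)
    return [tally.get(d, 0) for d in range(10)]


def chi_square(observed, probs):
    """Pearson chi-square statistic of observed counts against expected probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


if __name__ == "__main__":
    # Hypothetical polling-station counts, for illustration only.
    votes = [263, 172, 1045, 98, 7, 310, 486, 1290, 52, 119]
    observed = second_digit_counts(votes)
    print("second-digit counts:", observed)
    print("chi-square vs NBL2:   ", round(chi_square(observed, nbl2_probabilities()), 3))
    print("chi-square vs uniform:", round(chi_square(observed, [0.1] * 10), 3))
```

With real data, a statistic that is much smaller against the uniform distribution than against the NBL2 would correspond to the pattern reported for the electronic NO votes.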

4.3.2 Venezuela 2000 For comparison purposes, the Venezuelan
presidential election of 2000 (the presidential election
preceding the recall referendum of 2004) is presented here.
(Data can be found at
https://sites.google.com/a/upr.edu/probability-and-statistics/data-files-1,
http://esdata.info/downloads/ELECCIONES2000.zip.)

Here, none of the candidates' results, whether manual or
electronic, show compelling evidence against the NB2 Law,
although the winner's electronic votes yield a smaller
posterior probability than the others. Although to a lesser
extent than in the 2004 referendum, this result may indicate
the need for closer scrutiny of the winner's electronic votes.
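To make the notion of a posterior probability of the law concrete, the sketch below computes P(H0|data) from a Bayes factor for multinomial second-digit counts. It rests on assumptions that are not taken from this section: a uniform Dirichlet(1, ..., 1) prior on the digit proportions under the alternative, and equal prior probabilities for the two hypotheses. The prior actually used in the paper is defined in an earlier section and may differ, so the numbers produced here need not match the tables.

```python
import math


def nbl2_probabilities():
    """Second-digit Newcomb-Benford probabilities for digits 0..9."""
    return [
        sum(math.log10(1.0 + 1.0 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]


def posterior_prob_h0(counts, probs, prior_h0=0.5):
    """Posterior probability that the counts follow `probs` (H0).

    Assumption: under H1 the proportions have a uniform Dirichlet prior,
    whose marginal likelihood is Gamma(k) * prod Gamma(n_i + 1) / Gamma(n + k).
    The multinomial coefficient cancels in the Bayes factor and is omitted.
    """
    n = sum(counts)
    k = len(counts)
    log_lik_h0 = sum(c * math.log(p) for c, p in zip(counts, probs))
    log_marg_h1 = (
        math.lgamma(k)
        + sum(math.lgamma(c + 1) for c in counts)
        - math.lgamma(n + k)
    )
    log_odds = (log_lik_h0 - log_marg_h1) + math.log(prior_h0 / (1.0 - prior_h0))
    # Numerically stable logistic transform of the posterior log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    law = nbl2_probabilities()
    # Hypothetical counts: one set close to the law, one exactly uniform.
    close_to_law = [2394, 2278, 2176, 2087, 2006, 1934, 1867, 1807, 1751, 1700]
    uniform = [2000] * 10
    print(posterior_prob_h0(close_to_law, law))
    print(posterior_prob_h0(uniform, law))
```

Working with log-gamma values keeps the computation stable for sample sizes of the order seen in the electoral data, where the raw likelihoods would underflow.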

5. CONCLUSIONS 

The main conclusions to be reached here are as 
follows: 

1. At a technical level: (i) the RNBL is a substantial
generalization of the NBL that enlarges its domain of
applications. However, in the electoral processes presented
here, the results with and without the restriction did not
differ much. This may be because there is no constant upper
bound, since the total number of electors is not the same for
all polling stations. It is nevertheless the case that the
second digit law is far less affected by restrictions than
the first digit law. (ii) The second digit NBL2 is a useful
test for quick detection of anomalous behavior in electronic
or manual elections. (iii) The Universal Lower Bound and,
even more so, Bayes Factors are appropriate measures of
evidence of the fit to the law, and p-values are not,
particularly for large data sets like the electoral data.
2. Regarding the detection of anomalies: (i) the USA 2004
elections show a remarkable fit to the first digit
Newcomb-Benford Law, and also to the second digit NBL. All
the manual elections show support for the second digit NBL.
(ii) On the other hand, the electronic results of the votes
in favor of the NO in the Recall Referendum violate the NB2
law. This is surprising, since the manual votes in favor and
against, as well as the electronic votes in favor of the
opposition, fit the law reasonably well. In the previous 2000
Venezuelan presidential elections, there is no compelling
evidence against the law, although again the electronic
results in favor of the winner show only about 13% of
posterior probability in favor of the law.

Fig. 8. Puerto Rico 1996 Elections compared with the
Newcomb-Benford Law for the second digit. Panels: (a) PNP
Party; (b) PPD Party; (c) PIP Party.

Fig. 9. Puerto Rico 2000 Elections compared with the
Newcomb-Benford Law for the second digit. Panels: (a) PNP
Party; (b) PPD Party; (c) PIP Party.

Fig. 10. Puerto Rico 2004 Elections compared with the
Newcomb-Benford Law for the second digit. Panels: (a) PNP
Party; (b) PPD Party; (c) PIP Party.

Fig. 11. Venezuela Revocatory Referendum Electronic NO Votes
proportions compared with the Newcomb-Benford Law's
proportions for the second digit. This is the only compelling
rejection of the NBL2 law.

Fig. 12. Venezuela Revocatory Referendum Electronic SI Votes
proportions compared with the Newcomb-Benford Law's
proportions for the second digit.

Fig. 13. Venezuela Revocatory Referendum Manual NO Votes
proportions compared with the Newcomb-Benford Law's
proportions for the second digit.

Fig. 14. Venezuela Revocatory Referendum Manual SI Votes
proportions compared with the Newcomb-Benford Law's
proportions for the second digit.

Fig. 15. Venezuela 2000 Election Electronic Votes proportions
in favor of the Winner compared with the Newcomb-Benford
Law's proportions for the second digit.

Fig. 16. Venezuela 2000 Election Manual Votes proportions in
favor of the Winner compared with the Newcomb-Benford Law's
proportions for the second digit.
Table 11

Results of the 2004 Presidential Recall Referendum in Venezuela for electronic votes

Venezuela RR                m       Median   P(H0|data)   p-values   Lower bound on P(H0|data)

NO Electronic NB2           19064   263      0.000        0.000      0.000
SI Electronic NB2           19063   172      1.000        0.024      0.196

Table 12

Results of the 2004 Presidential Recall Referendum in Venezuela for manual votes

Venezuela RR                m       Median   P(H0|data)   p-values   Lower bound on P(H0|data)

NO Manual NB2               4556    190      1.000        0.155      0.440
SI Manual NB2               4379    76       1.000        0.003      0.047

Table 13

Results of the 2000 Election in Venezuela for electronic votes

Venezuela 2000              m       Median   P(H0|data)   p-values   Lower bound on P(H0|data)

Winner Electronic NB2       6876    486      0.129        0.000      0.000
Runner up Electronic NB2    6872    265      1.000        0.017      0.160

Table 14

Results of the 2000 Election in Venezuela for manual votes

Venezuela 2000              m       Median   P(H0|data)   p-values   Lower bound on P(H0|data)

Winner Manual NB2           3540    103      1.000        0.366      >0.5
Runner up Manual NB2        3219    52       1.000        0.006      0.081
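The final column of the tables can be related to the p-values by a short calculation. The sketch below assumes that the lower bound on the Bayes factor in favor of H0 is the calibration -e p log(p), valid for p < 1/e, combined with equal prior probabilities; this assumption is not stated in this section, but it reproduces several tabulated entries (for example p = 0.138 gives 0.426, and p = 0.024 gives 0.196). Small discrepancies for other rows are consistent with the p-values having been rounded to three decimals. The function names are illustrative.

```python
import math


def bayes_factor_lower_bound(p_value):
    """Lower bound on the Bayes factor in favor of H0 implied by a p-value.

    Assumed calibration: -e * p * log(p) for p < 1/e, and 1 otherwise.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    if p_value >= 1.0 / math.e:
        return 1.0
    return -math.e * p_value * math.log(p_value)


def posterior_lower_bound(p_value):
    """Lower bound on P(H0|data), assuming equal prior probabilities."""
    b = bayes_factor_lower_bound(p_value)
    return b / (1.0 + b)


if __name__ == "__main__":
    for p in (0.138, 0.104, 0.024, 0.155):
        print(p, round(posterior_lower_bound(p), 3))
```

For p-values at or above 1/e the function returns 0.5, which corresponds to the entries reported as >0.5. Note how a p-value of 0.024, which would be declared significant at the 5% level, still leaves a posterior probability of at least about 0.2 for the law.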



Fig. 17. Venezuela 2000 Election Electronic Votes proportions
of the Loser compared with the Newcomb-Benford Law's
proportions for the second digit. (Plot title: Newcomb-Benford
Second Digit Law and 2000's Election Others Electronic votes.)

Fig. 18. Venezuela 2000 Election Manual Votes proportions of
the Loser compared with the Newcomb-Benford Law's proportions
for the second digit. (Plot title: Newcomb-Benford Second
Digit Law and 2000's Election Others Manual votes.)






Our methods, particularly the use of the Second Digit
Newcomb-Benford Law, add to the growing literature on
measures of surprise and legitimate suspicion in electoral
processes, particularly, but not exclusively, electronic
voting. Since our original suggestion in 2004, the NBL2 has
been becoming a standard tool in what Mebane has termed
"Election Forensics."